NTCIR-5 WEB Navi-2 Experiments at Osaka Kyoiku University - Page, Anchor and Title Indexing, and In-link Count, Inter Page and Inter Site Link Analyses
Authors
Abstract
This paper describes the experimental results of our participation in the WEB Navigational Retrieval Subtask 2 (WEB Navi-2). We built three gram-based indices: one for the text of the whole page, one for the text in title tags, and one for the text in anchor tags. Because gram-based indices index every string in the target text, words that are not found in dictionaries are indexed as well. We used the words in the TITLE tag of each search topic as queries. We performed three kinds of link analysis: in-link count, inter-site link analysis, and inter-page link analysis. We merged the word-search scores from the three indices with the link-analysis scores in various ways. We found that anchor text analysis was the most effective for WEB Navi-2, and that the way page and/or title scores are merged with the anchor score needs to be devised carefully.
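The abstract does not give the index layout or the merging formula, so the following is only a minimal sketch of the general idea: character bigram indices over page and anchor text, an in-link count, and a weighted sum of the three signals. All function names, the sample pages, and the weights `w_page`, `w_anchor`, and `w_link` are assumptions made for illustration, not the authors' implementation.

```python
from collections import defaultdict


def char_ngrams(text, n=2):
    """Return every overlapping character n-gram of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_gram_index(docs, n=2):
    """Map each n-gram to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in char_ngrams(text, n):
            index[gram].add(doc_id)
    return index


def search(index, query, n=2):
    """Return ids of documents containing every n-gram of `query`.

    Gram positions are not checked, so this is a candidate filter,
    not an exact substring match.
    """
    grams = char_ngrams(query, n)
    if not grams:
        return set()
    return set.intersection(*[index.get(g, set()) for g in grams])


def inlink_counts(links):
    """Count incoming links per page from (source, target) pairs, ignoring self-links."""
    counts = defaultdict(int)
    for source, target in links:
        if source != target:
            counts[target] += 1
    return counts


def rank(query, page_index, anchor_index, inlinks,
         w_page=1.0, w_anchor=2.0, w_link=0.5):
    """Score each text-matching page by a weighted sum (hypothetical weights)."""
    page_hits = search(page_index, query)
    anchor_hits = search(anchor_index, query)
    scores = {}
    for doc_id in page_hits | anchor_hits:
        scores[doc_id] = (w_page * (doc_id in page_hits)
                          + w_anchor * (doc_id in anchor_hits)
                          + w_link * inlinks.get(doc_id, 0))
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy collection: page text, anchor text pointing at each page, and links.
pages = {"p1": "大阪教育大学の公式ページ", "p2": "京都の観光案内"}
anchors = {"p1": "大阪教育大学", "p2": "観光"}
links = [("p2", "p1"), ("p3", "p1"), ("p1", "p2")]

page_index = build_gram_index(pages)
anchor_index = build_gram_index(anchors)
inlinks = inlink_counts(links)
results = rank("教育大学", page_index, anchor_index, inlinks)
print(results)
```

Because the index is built from raw character grams, the query "教育大学" is matched without any dictionary or morphological analysis, which is the property the abstract attributes to gram-based indexing. Giving the anchor signal a larger weight here is only a nod to the abstract's finding that anchor text was the most effective evidence; the actual merging schemes are not described in this text.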
Similar resources
NTCIR-4 WEB Experiments at Osaka Kyoiku University - Static/Dynamic Scoring Using Link Structure Analysis and Web Page Grouping
We performed gram-based indexing and retrieval for the NTCIR-4 WEB task. Building the indices took 34.7 hours, and the indices occupy 30.2 Gbyte. The median retrieval time per word is 26 msec. The ranking algorithm for retrieval results is based on a traditional probabilistic model. We report on the results of gram-based indexing and retrieval, and propose a scoring method based on li...
Exploiting Anchor Text for the Navigational Web Retrieval at NTCIR-5
In the Navigational Retrieval Subtask 2 (Navi-2) at the NTCIR-5 WEB Task, a hypothetical user knows a specific item (e.g., a product, company, or person) and needs to find one or more representative Web pages related to the item. This paper describes the system with which we participated in the Navi-2 subtask and reports its evaluation results. Our system uses three types of information obt...
Osaka Kyoiku University at NTCIR-10 CrossLink-2: Link Filtering by Title Tag of Corpus as a Dictionary
Our group (OKSAT) submitted two types of runs, named SMP and REF, for every subtask of NTCIR-10 Cross-lingual Link Discovery (CLLD). Our method uses the titles of Wikipedia pages (the corpus) in the source language as the entries of a dictionary, so no external dictionary is required. For SMP, we aimed to discover the cross-lingual links of the actual Wikipedia; in other words, it targets the Wikipedia ground truth. For R...
NTCIR-3 WEB Experiments at Osaka Kyoiku University - Towards Index Partitioning and Parallel Retrieval
We experimented with long gram-based indices at the NTCIR-3 WEB task. Building gram-based indices requires no analysis such as morphological analysis. 2-byte characters were extracted from the NTCIR-3 'cooked' version of the WEB task corpus. The total index size is 26 Gbyte, and building the indices takes about 18 hours. The median search time per word is 197 msec. Ranking algorithm used is based on a traditional...
Verification of Effective Retrieval Method for Anchor Text on Navigational Retrieval
We participated in the NTCIR-5 WEB Navigational Retrieval Subtask (Navi-2) in order to verify the most effective retrieval method for an index of anchor texts, using a retrieval system that indexed only anchor texts instead of the full texts of Web pages. We introduced retrieval methods that combine one or more of six retrieval measures: (a) anchor frequency (af), (b) reference consistency (rc), (c) ...
Publication date: 2005